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Abstract — Distributed network utility maximization (NUM) 
has received an increasing intensity of interest over the past 
few years. Distributed solutions (e.g., the primal-dual gradient 
method) have been intensively investigated under fading channels. 
As such distributed solutions involve iterative updating and ex- 
plicit message passing, it is unrealistic to assume that the wireless 
channel remains unchanged during the iterations. Unfortunately, 
the behavior of those distributed solutions under time-varying 
• channels is in general unknown. In this paper, we shall investigate 
' the convergence behavior and tracking errors of the iterative 
primal-dual scaled gradient algorithm (PDSGA) with dynamic 
scaling matrices (DSC) for solving distributive NUM problems 
under time-varying fading channels. We shall also study a specific 
application example, namely the multi-commodity flow control 
and multi-carrier power allocation problem in multi-hop ad hoc 
networks. Our analysis shows that the PDSGA converges to a 
[ limit region rather than a single point under the finite state 
■ Markov chain (FSMC) fading channels. We also show thatjhe 
'order of growth of the tracking errors is given by O (T/N), 
. where T and A*" are the update interval and the average 
sojourn time of the FSMC, respectively. Based on this analysis, 
we derive a low complexity distributive adaptation algorithm 
[ for determining the adaptive scaling matrices, which can be 
t implemented distributively at each transmitter. The numerical 
> results show the superior performance of the proposed dynamic 
' scaling matrix algorithm over several baseline schemes, such as 
[ the regular primal-dual gradient algorithm. 

Index Terms — Distributed Network Utility Maximization, 
Primal-Dual Scaled Gradient Algorithm, Time-Varying Chan- 
nel, Region Stability, Tracking Error Analysis, Tracking Error 
' Optimization 



I. Introduction 

The Network Utility Maximization (NUM) framework lias 
[ been widely adopted for network resource allocation problems 
' in wireline and wireless networks over the past few years. 
I In such NUM problems, the network control problem is 
• formulated as an optimization problem, in which the utility 
function of the network (e.g., efficiency, user satisfactory, 
etc.) is to be maximized, under some resource constraints 
(e.g., power constraint, link capacity constraint, etc.) |[T|-13- 
There are a lot of works that focus on solving the NUM 
problems (e.g., see lH for a comprehensive survey). Based on 
the primal-dual decomposition framework ||2l, |[3], distributive 
algorithms such as the dual-gradient algorithm |[T]-||71 and the 
primal-dual gradient algorithm ISll- lfTOl . have been proposed. 
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These distributive algorithms are quite useful for practical 
networks, which consist of randomly placed nodes and lack 
of centralized coordination. However, the implementation of 
the distributive algorithms involves iterative solutions with 
explicit message passing among nodes, which means that the 
algorithm may not converge within a negligible time especially 
in the wireless networks. In the existing literature |[T|-||2l, the 
convergence and the optimality properties of these distribu- 
tive algorithms are established under the quasi-static fading 
channel assumption, i.e., the channel coefficients between any 
pair of nodes in the network are assumed to be constant 
during the updates of the iterative algorithms. However, since 
the wireless channel is time-varying by nature and explicit 
message passing between network nodes is required for each 
iteration, it is a limiting assumption to assume that the chan- 
nels remain unchanged during a significant number of iterative 
steps before the algorithm converges ifTTI . As a result, it is of 
great importance both theoretically and practically to study 
the behavior of the distributive algorithms under time-varying 
channels. In this paper, we are interested in the behavior of 
the generalized primal-dual scaled gradient algorithm ISl- 
ifTOl for solving the distributed NUM problem in multihop 
wireless networks under time-varying channels (e.g., finite- 
state Markov channel). For such multihop wireless networks 
with time-varying channels, the optimal solution of the NUM 
problem, which depends on the channel state information 
(CSI) across the network, would also be time-varying. Hence, 
there are some technical challenges which require further 
investigation in order to understand the behavior of these 
distributive iterative algorithms in time-varying CSI situations. 
For example, 

. How to quantify the performance penalty due to 
time-varying channels? Most of the NUM algorithms 
are designed assuming the channel is static during the 
iterations of the algorithms. Hence, it is important to have 
a better understanding of the performance penalty due to 
the time-varying channels. For example, due to the time- 
varying channels, it is not possible to guarantee that the 
problem constraints (such as the power constraints or flow 
balance constraints) will be satisfied at every time slot. 

• What is the cost-performance tradeoff? It is also 
important to study the tradeoff relationship between how 
fast the iterative algorithm updates (the message passing 
overhead), the speed of the time varying channels and 
the performance penalty of the NUM problem. However, 
doing so involves careful analysis of the transient of the 
iterative algorithms, which is highly non-trivial. 
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> How to enhance the existing NUM solutions? Fur- 
thermore, based on the understanding of the performance 
penalty, we are interested to enhance the existing iterative 
NUM algorithms (designed for quasi-static channels) to 
improve the associated tracking performance in time- 
varying channels. 

Due to the stochastic nature of the wireless channel and the 
nonlinear dynamics of the iteration processes encountered, it is 
highly nontrivial to answer the above questions in general 1 11 1. 
There are some preliminary works which study the impacts of 
time-varying inputs to the algorithm. In Kelly's classical work 
lfT2l . the author introduced stochastic perturbations in the al- 
gorithm to represent random load entering the network. In fS], 
the authors studied a stochastic NUM driven by the stochastic 
noisy feedback. However, liT2l and Q did not consider the 
impact of time-varying CSI and their system models have a 
unique equilibrium point. To study the impacts of time-varying 
CSI on distributive algorithms, the authors of ifTTl have studied 
the performance of the distributed scaled gradient projection 
algorithm for tracking the moving Nash Equilibrium (NE) 
of the multicarrier interference game under the finite-state 
Markov channel (FSMC) model, based on randomly switched 
system modeling IfTSl . In lfT4l . the hybrid system model was 
used to study the multicell CDMA interference game. While 
these works ifTTI . lfT4l provide some preliminary results on the 
behavior of the distributed algorithms for solving games under 
time-varying channels, to the best of our knowledge, there is 
no related work studying the behavior of distributed algorithms 
for NUM problems under time-varying CSI. Furthermore, due 
to the decomposition techniques fj], JSJ, the dynamics of 
the distributed algorithms for the NUM problems are quite 
different to those for solving non-cooperative games. On the 
other hand, there are also some works on estimation theory 
concerning parameter tracking in nonstationary environments 
ifTSl . |[T6l . However, the techniques and results of ifTSl . lfT6l . 
which are based on the special linear structure of the underly- 
ing dynamics (e.g., linear regression model and the linear least 
square estimates), cannot be applied to the general distributive 
NUM problems we considered in this paper. 

In this paper, we shall attempt to shed some light on 
the above technical challenges by studying the behavior of 
an iterative primal-dual scaled gradient algorithm (PDSGA) 
im-HOl under time-varying CSI. We model the CSI by a 
finite state Markov chain (FSMC) Ull-llll. Using sample- 
path analysis on the algorithm trajectory, we derived closed- 
form expressions for the algorithm tracking errors, which are 
defined as the difference between the algorithm trajectory and 
the moving target optimal solution of the associated NUM 
problem at a given time. Due to the time varying channels, the 
NUM problem constraints may not be satisfied at every time 
slot and we define such event as constraint outage. We have 
also quantified the probability of constraint outages in terms 
of the speed of the time-varying CSI as well as the constraint 
backojf margins. Based on these results, we shall propose a 
novel algorithm to dynamically adjust the scaling matrix in 
the PDSGA based on the current CSI. The proposed solution 
has low complexity and can be implemented distributively 



at each receiver node utilizing the local CSI only. As an 
illustration, we consider an application example, namely the 
joint multi-commodity flow control and multi-carrier power 
aUocation (MCFC-MCPA) problem H], E). We compare the 
performance of the PDSGA with the proposed dynamic scaling 
matrices against various baseline references. 

The paper is organized as follows. In Section [III we in- 
troduce the general NUM formulation, FSMC model, and an 
example. In Section |III1 we elaborate the primal-dual scaled 
gradient algorithm (PDSGA) and its convergence behavior. In 
Section IIVI we shall derive low complexity solution of the 
adaptive scaling matrices. Section [V] demonstrates the tracking 
performance of the PDSGA using the proposed dynamic 
scaling matrix algorithm and verifies the analytical results by 
simulations. Finally, we conclude with a summary of the main 
results in Section IVll 

Notations: Matrix and vectors are denoted with capitalized 
and small boldface letters, respectively. (a-'^) denotes the 
transpose of matrix (vector) A (a). [AJ^^- denotes the 
entry of matrix A, and 1^^ denotes the Np x Np identity 
matrix. C, and IR+ denote the set of complex numbers and non- 
negative real numbers, respectively. E denotes the operation 
of taking expectation; denotes the operation of Cartesian 
product; a ^ b denotes componentwise comparison. Finally, 
1 denotes a column vector of all ones. 

II. System Model 

In this section, we shall introduce the general network utility 
maximization (NUM) formulation, the finite state Markov 
channel (FSMC) model, as well as the application example. 

A. Network Utility Maximization under Time-varying Chan- 
nels 

Consider a multihop wireless network with K nodes, with 
/C denoting the set of all nodes. We shall first consider 
the following optimization problem with uncoupled utilities 
and coupled constraints |[l]-|I71. We shall illustrate with an 
application example in Section III-CI how this optimization 
problem corresponds to general NUM problems. 

Problem 1 (General Network Utility Maximization Problem): 
At time-slot t, the network-wide resource allocation problem 
is given by 

maxmize .A- (^/t; h(i)) 

subject to g (x; h(/;)) ^ 0, 

where fk (xfc;h(i)) : M.^'' R denotes the concave utility 
function of the fc*'* node; Xfc £ M.^*" is the resource allocation 
vector at the fc*'* node, with Mk dimensions; the vector 
X = [xf x^ ■ • • x^] ^ e M.^ is the collection of all the 
resource allocation vectors {x^j^^j^, with M = X]fc=i ^k', 
g (x; h(0) = [gi (x; h(i)) 92 (x; h(0) • • • 9n^ (x; h(t))]^ 
are concave functions (which are related to the flow-balance 
constraints in the wireless networks), with Nc representing 
the number of inequality constraints; and h(t) denotes the 
collection of the global channel fading coefficients (GCSI) of 
the network. 
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Such optimization problem in ([T]i has been widely studied 
in the literature for quasi-static CSI {h(i)}. For instance, 
the GCSI {h(t)} are assumed to be constants while solving 
the NUM problem in [Tl-Q. A common approach to solve 
Problem [7] (with constant h(t)) is the dual decomposition 
approach 121, |[3l, which is based on the Lagrangian multiplier 
(LM) theory 1201 . Specifically, the Lagrangian of Problem Q] 
can be written as: 



/:(x,A;h(i)) 



K 

E 

k=l 



/fc(xfe;h(0)+A^g(x;h(i)), (2) 



where A are the LM associated with the inequality constraints. 

Before preceding further, we make the following assump- 
tions. 

(Al) /fe(xfc;h(t)) is a twice-differentiable strictly con- 
cave function in x^ >^ 0, and {g^ (x; h(t))}^°^ 
are twice-differentiable concave functions in x >r 

o,Vfc e K. 

(A2) There exits a vector x(h(f;)) >r 0, such that 
g(x;h(t)) ^0. 

(A3) Primal and dual optimal values of Problem Q] can 
be attained ll20l . 
\]vid&i: Assumption (Al), Pro/j/em |7] admits a unique global 
optimal solution x* (h(i)), and Assumption (A2), strong 
duality holds for Problem Q] i.e., the duality gap is zero 
II20I . Moreover, the three assumptions together suggest that 
the Kuhn-Tucker theorem l8l- lfT0l applies for Problem\I] i.e., 
a vector x* (h(t)) >r is an optimal solution of Problem 
m if and only if there is a vector A* (h(t)) >z such that 
(x* (h(t)) , A* (h(t))) is a saddle point of the Lagrangian 
jC (x, A; (h(t))) ISl- lTOl . As a result, solving the concave 
programming Problem \J} and finding the saddle point of 
the Lagrangian £(x, A;h(<)) are equivalent lISl- lTOl . In the 
literature, primal-dual algorithms are widely used to solve 
Problem \T\ We shall discuss this through a specific example 
in Section liLD] 

B. Finite State Markov Channel Model 

In this section, we shall elaborate the statistic model for 
the time-varying CSI. Let h;(i) G Hi denotes the channel 
fading coefficients between the transmitter-receiver pair on the 
l*^ link at time-slot t, V/ G £. Motivated by the simplicity 
of the FSMC model lT7l - lfT9l for time-varying channels, we 
model the channel fading process |hi(t) G ?{;| as an ergodic 

finite-state Markov chain (FSMC), with state space "Hi and 
cardinality \ni\ = Qu G £. For the FSMC {h((t)}, we 
make the following assumptions. 

(A4) Similar to lfT7l - lT9l , the transition probability matrix 



G 



pQiXQi 



of the FSMC {h;(t)} is assumed to 



have the following structure: 
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where = 1 — 2£ and e = O {fdj), with fd and 
T denoting the doppler frequency shift and symbol 
duration, respectively. 
Let h(t) G Ti, denote the collection of all the channel fading 
coefficients (GCSI), then {h(t)} is also an ergodic finite-state 
Markov chain. The state space T-L and transition probability 
matrix T of the FSMC {h(t)} are given by 



Hi, and T 



G 



^QxQ 



(4) 



{les} 



{les} 



where Q = \H\ = Y[{ieE}Qi cardinality of the 

aggregate state space H = {h'^^h'^^-- - , h''3)| of the 
FMSC {h(t)}. For notational convenience, we shall use q G 
Q = {1,2,-- - ,Q} as the indexing variable for enumerating 
the realization of {h(f)} in the rest of the paper. 

C. Joint Multicommodity Flow Control and Multicarrier 
Power Allocation 

As an application example, we consider a joint multi- 
commodity flow control and multicarrier power allocation 
(MCFC-MCPA) problem HI, H in Problem [7] Consider a 
multihop wireless network modeled by a directed graph HI, 
|4J Q ~ {IC,£), where K, is the set of nodes and £ is the set 
of directed edges (i.e. delivering flows for the corresponding 
commodities), with |/C| = K and \£\ = L. Fig. [1] illustrates 
an example wireless network with K = 6 nodes and L = 8 
directed Unks. 




Fig. 1. A specific example of the multihop wireless network with K = 6 
nodes and L = 8 directed hnks, where each of the source nodes (i.e., node 
1, 2, 4, and 5) has two data flows. Destinations of the data flows and the 
routing information are summarized in the Table shown in Fig. |2] If a node 
has multiple outgoing links (e.g., node 1), we assume these links (e.g., Hnk 
1 and link 3) occupy different subbands and do not interfere with each other. 
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Fig. 2. Summary of the destinations of data flows and routing information 
of the example network shown in Fig. [T] 



In our problem, we consider the system to be operating in 
Np > 1 independent subbands between any pair of nodes. For 
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each link, it sees Np channel gain coefficients and we denote 
the CSI as hi{t) = [hii{t) ha ■■■ hiNp (i)]^. We denote the 
power allocation to the l*^^ link as P; = [Pn P12 ■ ■ ■ PiNpf^, 
where P;„ is the transmit power of link I on the n*'' subband. 
Furthermore, to deal with the interference across links, we 
apply multiuser detection (MUD) at each receiving node to 
recover the data from all the interfered links by successively 
canceling the interference. This technique can be applied as 
long as the transmitting rate for each link lies in the capacity 
region Cfc„'. Let Ik denote the set of interfered links at the 
receiving node k. For example, I2 = {1,4}, I3 = {2,6} in 
Fig. [T] Then the capacity region Ckn {P;h{t)) seen by the 
receiving node k on the n"* subband at time-slot t is given 
by ED (Vfc e /C, 1 < n < Np) 



J2 < log 1 



E 



a" I 

where the vector Ckn = [cin C2n ■ ■ ■ C|Xfc|n] denotes the 
achievable data rate vector of all the interfered links at the 
n*'' subband at receiving node k, and denotes the noise 
variance. For example, C2„ = [ci„ C4nf, C3„ = [c2„ cgn]^ 
in Fig. [1] The MCFC-MCPA problem can be formulated as 
follows. 

Problem 2 (MCFC-MCPA under Time-varying CSI): 
Suppose that source node k has Rk commodities (i.e., Rk 
different data flows), the MCFC-MCPA problem is given by 



maximize \^ \^ fks {rks 



(6) 



fe=l ,s=l 



subject to 



=1 



(7) 



E 

Ckn 6 Ckn (P; h{t)) , Vfc e /C, n < Npi^) 

l^Pl < Pk,ma., Vfc G /C, (9) 



where = [r^i rk2 ■■■ VkB^A ^ is the vector of 
data flows of the Rk commodities at source node k; c = 
{c;„, V/ ^ £,1 < n < Np} is the collection of all the auxiliary 
variables which are the achievable rates of the link at the 
n*'' subband; Vk denotes the set of outgoing links- at the fc*'' 
node; Pk.max denotes the per-node sum-power constraint. 

Note that Problem |2] is a special case of Problem 
|7] Specifically, the optimization variables {'X-k}k=i ™ 
Problem Q] correspond to the data rate control variables 
{'"fclfcLi' power control variables {Pi}^^ '■^^ auxil- 
iary variables {ci„,V/ £ 1 < n < Np} in Problem^ with 
Mk = Rk + '^Np\Vk\- Similarly, the concave objective 
functions {/fc (xfc; h(t))}^j^ in Problem |7] correspond to 

1 Sf^i fks{rks) \ in Problem |2] The concave constraint 
functions {gi {x;h{t))}-J:-^ in [7] correspond to the 



'For example, we assume there is MUD at the receiving node 2 so that 
Link 1 and Link 4 do not interfere with each other as shown in Fig. \l\ 

^All the outgoing links operate on different subbands. For example, we 
have Pi = {1, 3} and -P4 = 4, 7 in Fig.[T] 



constraints specified by (|7]i, (O and (|9|i in Problem \2\ with 
Nc = K -\- L -\- J2k=i (2'^''' - !)■ Observe that the capacity 
region given by the above formulation is a convex simplex 
and thanks to the auxiliary variable c/„, which denotes the 
feasible rate at link I and the n*'* subband, the constraints 
are strictly convex. Hence the optimization problem is strictly 
convex, provided that we have a set of strictly concave utiHty 
functions fks- 

D. Primal-Dual Iterative Solution for Quasi-Static CSI 

In the existing literature, Lagrangian primal-dual algorithms 
are widely used for solving convex constrained optimization 
problems. Specifically, the Lagrangian of Problem \2\ is given 
in ( [Tol l, where {J^k,i}^^i ~^ denote all the nonempty subsets 
pf Ik, the collection of all the forwarding links to node k. 
^et vectors x = [P r c]^ and A = [A^^) A^"^) xm^o^ 
lenote the collection of primal variables and dual variables 
respectively. The primal-dual gradient algorithm is given by 

X (< + 1) = [x(i) + aV^C (x(t), A(i); h , (1 1) 

A (t + 1) = [X{t) - aVxC (x(t), A(t); h (*))]+ , (12) 

where a denotes the stepsize; the projection operation, namely 
[a]^ — max{a, 0}, shall be understood componentwisely; 
the vectors \7^C (x(t), X{t); h (t)) and VxC (x(i), A(t); h{t)) 
denote the gradients of the Lagrangian C {■, ■;h.{t)) w.rt. x 
and A, respectively. It has been shown that (x(i), X{t)) in 
(fTTli-(fT2]l converges to (x*,A*) as t — >■ 00 under quasi-static 
GCSI for a strictly convex problem E-QOl, ll20ll . 

E. Distributive Implementation Considerations 

Note that, in Problem^ although the objective function has 
a decomposable structure, the constraints, such as the flow 
balance constraints (jTjl, the link constraints (|8]i and the power 
constraints (|9]i, are coupled among nodes. As a result, the 
computations of iterative updates (fTTI) -(fT2ll require global co- 
ordinates. For distributive implementation, dual decomposition 
methods are commonly used |[T|-||4l to allocate the coupled 
resources by pricing, in which the price levels (Lagrangian 
Multipliers (LM)) control the resources allocated to each 
subproblem and the price levels (LMs) are updated by a master 
problem. By these methods, the original problem (|6]l can be 
decomposed into three separate subproblems. 



K Rk 



91 



(c, A) = max ^ ^ fks{rks) - ^ ^ A^^'V^, 



k=l s=l 

Nf 



911 



, (MUD) 



/g£ n=l 



k: k£Tli i: leJk.^ 



and 

9111W = 



Nf 2'-^kl-i 



max 
{p^o 



^EE E Air' log 1 

' kelC n=l i-l \ 
S.t l^P' ^ Pk,ma. Vft e /C 



leVk 
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K Rk 

r, c, A)=^^A..(rfe,) 

fc=l s=l 

n=l {k,s)eTZi 



K Nf 2l^fcl-l 
V^V^ >{MUD) 

k=l n=l i=l 



loa 1 



E 



E 



(10) 



The master problem is to minimize ^/(c, A) + gii{X) + 
5777(A) with respect to auxiliary variables {c ^ 0} and 
dual variables {A ^ 0}. Note that given the data rates {c} 
and the prices {A}, the maximization problem in gi{c, A) 
is decoupled into each source commodity (fc, s), i.e. it can 
be computed distributively in each source node k. Note 
that to avoid trivial solutions in gij{X), we require a|'^'' ~ 
Efc: ken, lej,., ^1™°*- For 5777(A), since it still has cou- 
pling power constriants, we apply a further dual decomposition 
and obtain second level subproblems as: 



57V (A) 



Nf 2l^kl-l 



r^^^.EE E 4T^iog 

- ' keKn=l i=l 



- E E i"p' (13) 

One remark is that the objective function in the subproblem 
( fT3T l is coupled by the interfering multi-access links towards 
the receiving nodes. However, the subproblem (fT3l l can still 
be solved locally at each receiving node running MUD and 
hence, the subproblem ( fT3] ) can be solved locally. 

The following summarizes the distributive iterative solution 
for solving the MCFC-MCPA problem in Q based on the 
above decompositions. Here we ignore the algorithm initial- 
ization in which all the parameters can take arbitrary values. 
At each time slot t, we implement the following updates 
simultaneously. 

• At each source node k, based on the received message 
a|' the commodity rate r^g is updated by locally solving 



max 



fks{rks) - E A|''Vfcs 

l:(k,s)eni 



(r) 

Each link I calculates and broadcasts c;„and AJ by the 
received message of Aj,'^™^ as 



Q„(t + 1) 



UMUD) 

kni 



Nf 



(0 ~ 7; E ^ E 



rks 



(fe, s)en, 



Each receiving node k updates the power Pj„ to the 
corresponding transmitter ) by iterating one step to solve 



( fT3] ) as 

Pjn{t+1) = 



^.n(t)+«P E^i' 



(MUD) 



\h,n\' 



1 + 



- X 



(MUD) 

kni 



is also updated by minimizing the master 



A 

problem with one step in the iteration 



A 



(MUD) 



kni 



(t+l) 



X 



r(t)-7M|log|l + 



Ej l^jnP^jn 



where Up, 7m > are constants and aSJ^^'^'' together with 
Xj. are quantified by message passing. 
Each transmit node j updates LM X], based on 
messages Pj„ from the receiving node with constant 
value 7p by 



Ar(t+i) 



XP{t)--fpiPk,„,ax - E l^P' 



In the existing literature IJ)-!?), the channel coefficients 
{h/„} (channel state information CSI) are assumed to be 
quasi-static and there is a long enough time for the algorithm 
to converge before the CSI changes. However, in practice, this 
quasi-static assumption is difficult to achieve and the algorithm 
may only be able to iterate very few times before the CSI 
changes, for explicit message passing (such as the Lagrangian 

(r) (P) 

multipliers A; and Xj. ) between nodes are involved in the 
solution above. While the traditional notion of convergence 
discusses whether an iterative algorithm will converge to 
the targets (P^ and r^ J, we need to extend the notion of 
convergence when dealing with time varying CSI because the 
targets P;(h(f;))* and rfcs(h(i))* become moving targets. Fur- 
thermore, the time varying CSI also causes a subtle problem 
in that the constraints of the problem (such as flow balance 
constraints, link constraints and power constraints) may not be 
satisfied with P;(h(t))*, C(„(h(t))* and rfes(h(i))* at every 
time slot and this refers to the constraint outage. The issues of 
convergence w.r.t. moving targets and constraint outage will 
be addressed in the following sections. 

III. Randomly Switched Modeling and 
Convergence Behavior Analysis 

In general, one could use the (traditional) primal-dual gra- 
dient update in (fTTT i and ( fTSl i to solve the NUM problem 
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for quasi-static CSI. The transient behavior of the primal- 
dual update equations in (fTTl i and (fT2l i can be character- 
ized by an algorithm trajectory of an associated nonlinear 
dynamic system and the optimal solution is the equilibrium 
point of the nonlinear system. Since the optimal solution 
(x* (h(t)) , A* (h(t))) depends on the CSI, time-varying CSI 
corresponds to a randomly moving optimal solution (or a 
randomly moving equilibrium point) and the convergence 
behavior of the algorithm can be measured by how well the 
algorithm trajectory could track the moving equilibrium point. 
In this section, we shall first generalize the basic primal- 
dual update algorithm in (fTTI) and (fT2l i for the general NUM 
problem ([T]) under time-varying CSI. We shall then utilize the 
randomly switched system nonlinear control theory to analyze 
the convergence behavior of the generahzed algorithm. 

A. Primal-Dual Scaled Gradient Algorithm under Time- 
Varying CSI 

In this section, we generalize the Arrow-Hurwicz-Uzawa 
type primal-dual gradient method lISl- lfTOl for computing the 
saddle point of the Lagrangian £(x, A;h(t)) under time- 
varying channel conditions. In particular, scaling matrices 
Dx(i) and A{t) are introduced into the primal-dual iterations 
as follow: 

X + T) = [x(t) + Dx (t)Vx/: (x(<), A(t); h (< + T))] + 

(14) 

A (< + T) = [A(t) - A {t)VxC (x(0, X{t); h (i + T))] + 

(15) 

where the integer constant T > 1 denotes the period (in 
terms of time-slots) of updating the resource allocation vec- 
tors x(t) as well as the LM A(<); the symmetric positive 
definite matrix T>^(t) G C*^''^*^^ and A{t) € C^=^^<= 
are the scaling matrices, which are introduced to speed up 
the convergence rate under time-varying CSI; the vectors 
Vx/: (x(t), X{t); h{t + T)) and VxC (x(t), A(i); h{t + T)) 
denote the gradients of the Lagrangian £(•,•; h (t + T)) w.rt. 
x(t) and \{t), respectively. The iterative algorithm in (fT4l l and 
(fI3T l is called the PDSGA. 

Note that the concept of using the scaling matrix to speed up 
the convergence rate of iterative algorithms in quasi-static fad- 
ing channels has appeared in previous literature. For example, 
when Dx(t) diag{a} and A{t) = diag{a}, the iterative 
algorithm in ( fT4b and (fTSl l reduces to the standard Arrow- 
Hurwicz-Uzawa algorithm l8l- lfT0l . On the other hand, in ['1|, 
II22I . the scaling matrices are chosen to be the diagonal entries 
of the inverse Hessian. In f201, |231, the scaling matrices 
are chosen to be the inverse Hessian matrix. In this section, 
we shall focus on analyzing the tracking performance of the 
PDSGA under time-varying CSI for general scaling matrices. 
In Section IIVI we shall propose a low complexity algorithm 
to dynamically adjust the scaling matrix according to the CSI. 

Remark 1: (Tradeoff between Message Passing Overhead 
and Performance) In the iteration process ( fT4] i and ( fTsT i, the 
network nodes update their resource control vectors every T 
time-slots. This parameter T controls the underlying tradeoff 
between message passing overhead and the performance. For 
example, when the updating period T = 1, new resource 



control results will be used in each time-slot at the expense of 
high message passing overhead. On the other hand, when T 
is large, the same resource control vector x.{t) will be applied 
for T time-slots before the next update at time-slot (t + T) 
and hence, this corresponds to low message passing overhead 
at the expense of some performance loss. 

Remark 2 (Impact of the Time-Varying GCSI): As the 
FSMC {h(t)} jumps from one state to another randomly, 
the saddle point (x* (h(t)) , A*(h(t))) of the Lagrangian 
C (x, A; h{t)) also changes with time randomly. As a result, 
the primal-dual scaled gradient iteration process (fT4l i and 
(fTST i would not converge to a single point but rather track the 
moving saddle point (x* (h(t)) , A* {h{t))). The convergence 
property and tracking performance of the PDSGA in (I14t 
and ( fTSl i are the focus of this paper. 

B. Randomly Switched System Modeling 

Randomly switched systems are piecewise deterministic 
stochastic systems, i.e., between any two consecutive switching 
instants, the dynamics are deterministic 113J. Formally, a 
discrete-time randomly switched system is defined as follows 

Definition 1 (Discrete-time Randomly Switched System): 
A discrete-time randomly switched system consists of a family 
of subsystems, and a random switching signal that specifies 
the active subsystem at every time-slot. Mathematically, 

y{t + T) = .^u (y(i)) - when T{t)^ueU, (16) 

where y(t) denotes the system state; (y(0) denotes the 
M*'* subsystem; and t (n) is the switching signal with state 
space U. ■ 
For the FSMC {h(t)}, the channel fading process stays in 
a state h^''^ for a random sojourn time of Tq time-slots, and 
then jumps to another state randomly. During the Tq time- 
slots, the channel fading coefficients {h(f;)} remain constant 
(i.e., h(t) = h^'^^) and the system is deterministic. We thus 
can model the dynamics of the proposed primal-dual scaled 
gradient iteration process (fl4l i and ( fTsT i as a randomly switched 
system, with the FSMC {h(t)} being the switching signal and 
the channel state h'*' corresponding to the g*'' subsystem lfT3l . 
for all q e Q. 

To obtain the dynamics of the randomly switched system 
of the proposed primal-dual scaled gradient iteration process 
(fl4l i and (flST l explicitly, we rewrite iterations given by equation 
(O and (flST l into one vector form, i.e., 

y{t + T) = [y(<)+D(t)V/:(y(<);h(f + r))]+, (17) 

where y{t) = [x{t) X{t)f e M^^^+^'^^ is the collection of 
all the iterates, D(i) = blkdgl{Yi^{t), ^{i)} is the scaling 
matrix for the vector y{t), and V£ (y(t); h (i + T)) denotes 
the overall ^cf/f/oMi gradient'^ of the Lagrangian £(x, A; 

Sojourn time for a switched system is the total time a subsystem spends 
in a state before leaving it. 

"^Note that V£(t) defined in (Ts) is the search direction in the primal-dual 
gradient algorithm. It is, however, not the actual gradient of the Lagrangian 
function C. 
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h{t + T) ), evaluated at (x(i), X{t)), i.e., 



VC (y(t);h(t + r)) 



-VA/:(x(t),A(i);h(i + T) 

= v/:(t). (18) 

Moreover, we denote the equilibrium point of the g*'* 
subsystem as y* = (x*,A*), g G Q. As a result, we end 
up with a randomly switched system with Q subsystems, 
whose dynamics is given by iT% and the Q equilibrium 
points are given by {yq}'^_^- For notational convenience, 
V£ (x(t), A(t); h{t + T)) will be simphfied as VC (t) here- 
after 

Remark 3 (Interpretation of the Equilibrium Points): Note 
that, an equilibrium point of the dynamical system defined in 
([TtT i is a saddle point of the Lagrangian C (x, A; h(t)) defined 
in (|2]i. Moreover, for each equilibrium point y* = (x*, A*), 
the first component x* is the unique global optimal solution 
of the NUM problem ([TJ, when the channel fading process 
{h{t)} is in state h^'), \fq e Q. 

C. Convergence Analysis of the PDSGA 

The region stability is a widely adopted performance mea- 
sure of iterative algorithms in time-varying environments, 
especially for randomly switched and hybrid systems 1 1 1 1. 
Ifltl . Il24l . When the CSI is time-varying (e.g., the FSMC 
model), the equilibrium point of the network is also changing 
and hence, the algorithm trajectory of the primal-dual scaled 
gradient algorithm will not converge to a single point but rather 
a limit region as illustrated in Fig. [3] for the one dimensional 
case. We shall formally define region stability as follows. 

Definition 2: (Region Stability of Randomly Switched Sys- 
tems) A discrete-time randomly switched system with state 
vector y{t) is said to be stable w.r.t. a limit region y, if 
for every trajectory y(t,y(0)), there exists a point of time 
To (y(0)) such that from then on, the trajectory is always in 
the hmit region £. Mathematically, V y (t, y(0)), 3 Tq (y(0)), 
such that y (t, y(0)) £ V t > Tq (y(0)). ■ 



Limit Region 




Fig. 3. An illustration of the region stability of a randomly switched system 
with one-dimensional state space 1241 . From time-slot To and onwards, the 
trajectory goes outside the limit region with probability P-y given in equation 

For analyzing the convergence behavior of the PDSGA in 
([TtT i. we shall first consider the case when the channel fading 
process {h(t)} is in a particular channel state h^''^ for a 
sojourn time of Tq time-slot, Vg S Q. Here the sojourn time 
Tq is defined to be the time period in which the channel 
state h^''-' remains unchanged. Let y* = (x*,A*) denote the 
equilibrium point of the iteration (fTTT i when the channel fading 



process {h(t)} is in state h'^'^^, Vq G Q. We first introduce the 
following lemma, which summarizes the contraction property 
of the PDSGA in 

Lemma 1 (Contraction Property of the PDSGA): Suppose 
that D (h^*?)) is designed in a way such that 



/3 (D(i)) ^ max 
y 



I + D(t)V2/: y;h(« 



< < 1, 



(19) 

where y G M^^''^^'''' is the collection of primal and dual 
variables. For one iterative update under the same channel 
state we have 



(t + T) 



' 9 Il2 



< Pg \\y{t) ~ 

-A'c)x(MH 



-y<i\ 



(20) 



where V2£(y;h(9)) e jg(A/+.v.,.,...T.,e, denotes the 
second-order derivative of the Lagrangian C (y(i); h^*?') w.r.t. 
y, evaluated at y; f3q is referred to as the contraction modulus 
at the g*'' channel state, Vq G Q; and the matrix-2 norm 



is defined as 11 A 11 



lAxll,,, with 



„ = maxr II II ,1 
denoting the standard Euclidean norm. 

Proof: Please refer to Appendix |A] for the proof. ■ 
Remark 4 (Geometric Convergence of the PDSGA): Note 
that the contraction modulus /3q depends on the scaling 
matrix D(t), which may exceed 1 for some inappropriate 
choice of D. Hence, we shall restrict the domain of D such 
that f3q < 1. Table U summarizes the worst-case /3 under the 
adaptive scaling matrix (in Section lIVI i for the 6-node joint 
network flow and power control problem as illustrated in Fig. 
[T] and Fig. |2] Observed from Table U /3g is always less than 
1 for all q and hence, there exists at least one scaling matrix 
control policy that satisfies (T9[ . 

Moreover, since the sojourn time Tg of the channel state 
h'*) is a random variable (which is geometrically distributed 
with parameter v), \/q ^ Q , and the network nodes update 
their resource allocation vectors x(t) and the LM \{t) every T 
time-slots, there may be several updates, or only one update, 
or even no update during the Tg time-slots. The number of 
updates Ng of the PDSGA in ( fTTb during the sojourn time 
Tq of the channel state h^'^^ ^q G Q, is given by Ng = 
=■ , Vg G Q, where [-J denotes the floor function. 



sing Lemma [7] we can derive the region stability of 
the PDSGA in ( fTTb . which is summarized in the following 
theorem. 

Theorem 1 (Region Stability of the PDSGA): Under the 
conditions that (3 {T){t)) < f3g < l,Vg G Q (see equation 
(fT9]l). at steady state (i.e., for sufficiently large t), the 
probability that the iterates y{t) generated by STU being 
inside the limit region y is given by: 



lim Pr{y(t) G 3^} > 1 



min < 1 



f3T 



?1) 



(1-/3) TV 

where /3 = maxjvgeQ} is the maximum contrac 



tion modulus of all channel states; N 



is the average 



sojourn time of the FSMC {h(i)}; and the limit region y is 
given by 

Q 



y 



U{ 

9=1 



y y 



' 9 Il2 



<s 



(22) 
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where 6 ~ niax{vg,reQ.g7^r} \\yq ~y*||2 denotes the maxi- 
mum distance between two equihbrium points in set of all 
equilibrium points {y^jV q G Q}. 

Proof: Please refer to Appendix iBl for the proof. ■ 
Remark 5: (Region Stability versus Average Sojourn Time 
and Update Period) The upper bound given in (|2TI) in Theorem 
[7]depends on the average sojourn time N of the channel fading 
process {h(i)} and the update period T. The average sojourn 
time N can be thought of as an indicator of the channel fading 
rate. The larger the N is, the channel changes more slowly and 
the larger the P^ is. On the other hand, the update period T 
represents the tradeoff between the message passing overhead 
and the tracking performance of the PDSGA in (T% . The 
smaller the update period T is, the greater the message passing 
overhead becomes and the larger the Py is. In particular, we 
have Py = l-0 (T/N). 

D. Asymptotic Analysis of the Tracking Errors 

In this section, we shall derive the closed-form expressions 
of the asymptotic order of growth of the tracking errors 
associated with the PDSGA in ([TtIi under the FSMC channel 
model. Here, we shall study both the expected-absolute-error 
(EAE) and the mean-square-error (MSE). We first give the 
formal definitions of EAE and MSE below. 

Definition 3 (EAE and MSE): The EAE (or MSE) is de- 
fined to be the expectation of the distance (or squared distance) 
between the iterate y{t) and the corresponding target optimal 
solution y* i.e., 

EAE(y(t))=E{h(t)}{||y(i)-y;||2}, 

MSE {y{t)) = E{hW} { ||y(i) - y; , (23) 

where the expectation shall be taken over the stationary 
distribution of the FMSC {h(t)}. 

Note that as the switched system y{t) is driven by switching 
signal CSI h(i) — h^"^', taking expectation over h(t) and 
taking expectation over subsystems y(t) = y* are equivalent 
in (I23I 1. Due to the randomness of the channel fading process 
{h(i)} and the nonlinear dynamics in ([TtI i. it is impossible 
to derive the exact expressions for the tracking errors. How- 
ever, we can derive the asymptotic order of growth of the 
expected-absolute-error EAE (y(t)) and the mean-square-error 
MSE {y{t)), which are actually good enough to design the 
tracking error optimal scaling matrices, as shall be shown in 
the next section. We now summarize the main results of this 
section in the following theorem. 

Theorem 2: (Asymptotic Order of Growth of the EAE and 
the MSE) Under the conditions that f5 (D(i)) < /?q < 1, Vq G 
Q (see equation (fT9]l), the asymptotic order of growth of 
the expected-absolute-error EAE (y(i)) and mean-square-error 
MSE (y(t)) at steady state (i.e., for sufficient large t) are given 
by: 

V {l-p){l-li)N' I 



where (3 = max{vgeQ} {/3 (D(t))} is the maximum contrac- 
tion modulus of all channel states. 

Proof: Please refer to Appendix |C] for the proof. ■ 
Remark 6: (Tracking Errors versus Average Sojourn Time 
and Update Period) The expressions of the asymptotic tracking 
errors EAE and MSE given in Theorem |2] depend on the 
average sojourn time of the channel fading process {h(t)} 
and the update period T. The average sojourn time iV can 
be thought as an indicator of the channel fading rate. The 
larger the N is, the slower the channel changes and the 
smaller the tracking errors are. On the other hand, the update 
period T represents the tradeoff between the message passing 
overhead and the tracking performance of the PDSGA in ( fTTI ). 
The smaller the update period T is, the greater the message 
passing overhead becomes and the smaller the tracking errors 
are. In particular, we have EAE(y(f)) = 0(T/N) and 
MSE{y{t)) = 0{T/N). 

E. Analysis of Constraint Outage Probability and Backoff 
Margin 

Beside the convergence error, the time varying CSI may 
cause constraint outage in which the constraints of the NUM 
problem may not be satisfied at every time slot. To quantify 
this undesirable penalty of time varying CSI, we shall quantify 
the probability of constraint outage in this section. Moreover, 
although the constraint outage is unavoidable due to the time 
varying CSI, we would like to minimize the probability of the 
constraint outage and this can be achieved by introducing a 
constraint backoff margin. Specifically, let K be the constraint 
backoff margin corresponding to the constraint g(x) < 0. 
The following lemma summarizes the probability of constraint 
outage given a constraint backoff margin. 

Lemma 2: (Probability of Constraint Outage with Backoff 
Margin) Assume that there is an inequality constraint g(x) < 
in the NUM problem, where ^(x) : x G D C R" ih^ R 
satisfies conditions (i) g(x) is convex and (ii) g{x) is Lipschitz 
continuous on D with Lipschitz constant c > such that Vx G 
D, 

Ilff(xi)-ff(x2)|| <C||X1-X2||. 

Then, given a backoff margin K > 0, at the steady state (t -> 
00), the constraint outage probability for g{x.) < is bounded 
by 

lim Pv\g(x(t)) > e} < minjl, ^4t^ — 1 (25) 

where /3 < 1 is the maximum contraction modulus of all 
channel states for the optimization variable x(i), 6 is the 
maximum distance between equilibrium points from any two 
adjacent channel states, T is the algorithm update period and 
N is the average sojourn time of the channel states. 

Proof: Please refer to Appendix |D] for the proof. ■ 
Remark 7: (Tradeoff between Performance and Constraint 
Outage) Lemma |2] indicates that there is a tradeoff between 
system performance and constraint outage via the constraint 
backoff margin. For example, when the constraint backoff 
margin K is large, the constraint outage probability is small 
but the performance drops, because the margin restricts the 
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optimization variables to stay within a shrunk optimization 
region and hence excludes some optimal solutions. On the 
other hand, the margin also reduces the outage probability 
that the instantaneous points from the algorithm trajectory go 
beyond the original optimization region (i.e. the feasible set of 
the problem). Note that for the same target constraint outage 
probability, a larger backoff margin is needed for fast changing 
CSI. 

IV. Tracking Error Optimization and Scaling 
Matrix Adaptation 

In the previous section, we have established the region 
stability property of the PDSGA in ( fTTb and derived the order 
of growth of the tracking errors. In this section, we propose 
a low complexity distributive algorithm for determining the 
dynamic scaling matrices {D(i),Vt > 1} to minimize the 
tracking errors. Specifically, we shall show that the scaling 
matrixes shall be adaptive to the time-varying CSI h(t) and 
can be computed distributively by each small group of nodes 
based on the local information only. 

A. Tracking Error Control Problem 

Using the results of the tracking error analysis in (|24] | as 
well as constraint outage probability in dZSl ). the tracking error 
and the constraint outage probability are both shown to be an 
increasing function of the parameter f3 — max{vm} /3 (D(m)), 
where /3(D(m)) denotes the contraction modulus at the m*'' 
stage, which is defined as 



l3 (D (m)) = max 
y 



I + D(to) V^C{y;h{m))\ 



(26) 



where y G R^^^"''^''' is the collection of primal and dual 
variables. As a result, this suggests that the dynamic scaling 
matrices D(to) should be chosen to minimize (3 (D(m)), 
which is a function of the time-varying CSI h(m). This is 
formally cast into the following optimization problem. 



minimize /3(D(m)) 

D(m) 



(27) 



subject to D(m) >- 0,Vto > 1. 



From (l26l l, we can obtain the intuition that the optimal 
solution of (|27l) should be D(to) = -V^/: (y(t); h(m))"\ 
However, the computation of inverse Hessian requires 
global information, which is difficult to be implemented. 
Furthermore, the value of y{t) is not known and the solution 
is not attainable. To facilitate low complexity solution, we first 
take y — y{t) in (l26T l and (3 (D(m)) in (|27] | is replaced by: 

^(D (t + T)) = \\l + -D{t + T)V^C{yityMt))\\^i28) 

where y{t) is the iterate generated by (fTTT i at the t*'^ time- 
slot. To facilitate distributive implementation of the adaptive 
scaling matrix, we partition the whole network into G groups 
of nodes, and partition the collection of optimization variables 
y(*) = [yi(i) I y2(i) I ••• lyclO] accordingly. We then im- 
pose a block diagonal structure to the scaling matrix D(t), 
which is given by 

D (t ) = blkdlg{Dy , (0 , Dy, (i ) , • • • , Dy ) } (29) 



where yi(t), = 1, • • • , G is the collection of variables for 
the i-th group of nodes. A convenient choice for the scaling 
matrix blocks in ^ is Dy^(i) = -VlX{y{t); h{t))-^, 
which can be computed using local information within the 
group of nodes. Note that in wireless communication 
networks, nodes usually have weak connections when they 
are far away and hence, V'^C{y{t), h{t)) is actually a sparse 
matrix. Hence, imposing the block diagonal structure of D(t) 
in (l28l l does not cause too much performance loss as illustrated 
in Fig. HJT] in the simulation section. We shall illustrate the 
design with an example in Section HV-BI 



B. Example: Adaptive Scaling Matrices for the MCFC-MCPA 
Problem 

The Lagrangian of Problem |2] w.rt. the specific network 
topology in Fig. [T] in the MCFC-MCPA example is given by 
(e.g., Nf = 2, Rk = 2,Vfc G {1,2,4,5}): 

Rk 

/:(r,P,c,A)= J2 5]log(l + rfc,) 

fee{l,2,4,5} s=l 
{1,2,4,5} V l£Vk / 



' Np 



Cln - ri2 



\n=l 



+ rn + ri2) 



C4n - rn 



+ >^G^ I E '^6" - (^52 + ^42) I 

+ A^'"' ^Y^Cjn - (^42 

+ >^S^ E '^8n - (»^51 + ''22 + ^'ll) 

+ E ^in """^ i°g (i + ^i^^^i^f^!!^"! 



-X 



(MUD) \ ^ 
kni 2^ "^j"' 



where the sets of interfering links are: Ii = (i.e. empty 
set), I2 = {1,4}, X3 = {2,6}, I4 = {3}, I5 = {5,7}, and 
le = {8}; the vector c = {q„ :1<;<8, l<n< 
2} >z 0; and the vector A is the collection of |aJ'^^|, (aI;^''! 

and lA^^j^'^^j. Observed that interfering transmission Unks 
towards a common receiving node are coupled together via 
the link constraints to achieve a MUD capacity. Hence, we 
can partition the network via links in different MUD blocks. 
Specifically, the adaptive scaling matrix can be chosen with 
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the following structure, 

T){t) = blkdlg{D{MUD,l}(0> D{MUD,2}(0: ' • ' . ^{MVD.K}{t)} 

(30) 

where {MUD,fc} is the collection of variables 
coupled at the MUD node k. For example, 
{MUD,1} = {rii, r22, A*^)}, {MUD,2} - 

^ n ^ A, (MUD) , (r) > (r) x(P)lf„^tU^ 

i'^21, 7^22, ci2, C41, Pi, p4, A2 1 J , Ai , A4 , A2 } tor the 
network realization in Fig. [1] 'D{m.ud. k}{t) is the adaptive 
scaling matrix associated with the collection of variables 
{MUD,fc}. 

Using the solution in ( |29] | and the block diagonal 
structure of D(t) in ( l30b , the adaptive scaling matrix 
at MUD node k is given by D^j^^j^ T) = 



( 



^ {MUD, 



(r(t), P(t), A(t)) j where the computation 
does not require global information but only local information 
from the neighboring nodes. 



V. Numerical Results and Discussions 

In this section, we shall compare the PDSGA with three 
baseline schemes; (I) Bru-PDUA, i.e., primal-dual update al- 
gorithm with scaling matrices obtained by brute-force solving 
the optimization problem ^} with (l26]i; (II) Dia-PDUA, i.e., 
primal-dual update algorithm with diagonal scaling matrices, 
whose diagonal entries are given by the diagonal entries of the 
corresponding Hessian matrix 1231 : (III) Con-PDUA, i.e., the 
regular primal-dual update algorithm with constant stepsize 
^ = 0.005 II23I . We also show the performance and constraint 
outage probability tradoff under different channel fading rates. 
Our simulations are based on the MCFC-MCPA example 
described in Section lTl-Cl and further analyzed in Section lTV-BI 
Fig. [T] shows the network topology of the problem, where 
the network consists of K = 6 nodes and L = 8 directed 
links, and Fig. |2] shows the routing table. We consider one- 
hop data transmission. In addition, the average receiving SNR 
per-subband is 10 dB, the number of subband Np ~ 2, and 
the number of data streams of a source node Rk = 2. 

A. Tracking Performance Comparison 

Fig. |4] illustrates the normalized sum-utility versus time- 
slot index for the PDSGA and the three baseline schemes. As 
illustrated, the PDSGA using the proposed dynamic scaling 
matrix has a much better tracking capability than the baseline 
schemes Con-PDUA and Dia-PDUA, which are designed for 
quasi-static CSI. On the other hand, the PDSGA has similar 
performance as the brute-force solution Bru-PDUA, which 
shows that the approximation in ( l28T l indeed works well. 
We also compare (3 among the proposed PDSGA and the 
three baselines in Table U The results also indicated that our 
algorithm has a better performance than Con-PDUA and Dia- 
PDUA. 

B. Region Stability Property 

Fig. |5] shows the simulation results of region stability. The 
simulation results are consistent with the analytical results 
stated in Theorem [7] i.e., the probability that the algorithm 



TABLE I 

Comparison of Contraction Modulus /3 in Different 
Algorithms over All State of CSI h^'') 



Algorithm 


Average /3 


Worst Case /3 


Bru-PDUA 


0.3116 


0.7862 


Con-PDUA 


0.9904 


0.9998 


Dia-PDUA 


0.9401 


0.9865 


PDSGA witii proposed scaling matrix 


0.5876 


0.8309 



1 ^ 

0.95 rf.'-' 



0.75 
0.7 
0.65 



Target Optims^ 



U-PDUA I - 



1/ ■ 



80 100 120 140 160 180 200 
Time-slot (iteration) index (t) 



Fig. 4. Tracldng performance comparison of the PDSGA and the baseline 
schemes: Bru-PDUA, Dia-PDUA and Con-PDUA, where Bru-PDUA denotes 
the primal-dual update algorithm with scaling matrices obtained by brute- 
force solving the optimization problem I27t with j26t : Dia-PDUA denotes the 
primal-dual update algorithm with diagonal scaling matrices, whose diagonal 
entries are given by the diagonal entries of the corresponding Hessian matrix 
1231 ; and Con-PDUA denotes the piimal-dual update algorithm with constant 
stepsize ^ = 0.005 1231 . The sum-utility is normalized to the maximum 
sum-utility obtained across all the time-slots. 



trajectory at steady state being out of the limit region y (see 
equation (l22l i) is proportional to the normalized update interval 
T/N. Moreover, as the scaling matrices in the PDSGA are 
adaptive to the time-varying CSI, the PDSGA performs better 
than the baseline schemes Con-PDUA and Dia-PDUA. On the 
other hand, the performance of the PDSGA and the brute-force 
solution Bru-PDUA are similar 

C. Order of Growth of the Tracking Errors 

Fig. |6] and Fig. Q show the simulation results of the 
expected-absolute-error (EAE) and the mean-square-error 
(MSE), respectively. Both figures are consistent with the 
analytical results given in Theorem^ i.e., the tracking errors, 
namely EAE and MSE, are proportional to the normalized 
update interval T/N. Moreover, as the scaling matrices in the 
PDSGA are adaptive to the time-varying CSI, the tracking 
errors associated with the PDSGA are much smaller than the 
baseline schemes Con-PDUA and Dia-PDUA. On the other 
hand, the performance of the PDSGA and the brute-force 
solution Bru-PDUA are quite similar 

D. Constraint Outage Probability and Performance Tradeoff 

Fig. [8] shows the tradeoff between constraint outage proba- 
bility and performance under different channel fading rates. 
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Fig. 5. Region stability property of the PDSGA and the baseline schemes: 
Bru-PDUA, Dia-PDUA and Con-PDUA, where Bru-PDUA denotes the 
piimal-dual update algoiithm with scaling matrices obtained by brute-force 
solving the optimization problem j27t with (26); Dia-PDUA denotes the 
primal-dual update algorithm with diagonal scaling matrices, whose diagonal 
entries are given by the diagonal entries of the corresponding Hessian matrix 
1231 ; and Con-PDUA denotes the primal-dual update algorithm with constant 
stepsize t; = 0.005 1231 . The asymptotic order of growth refers to the results 
stated in T/ieorem [7] (see equation )2U ).Here PRegion = 1 ~ ^y- 




0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 
Normalized Update Interval (T/7V) 

Fig. 6. Expected-absolute-error (EAE) versus Normalized Update Interval 
of the PDSGA and the baseline schemes: Bru-PDUA, Dia-PDUA and Con- 
PDUA 1231 . The asymptotic order of growth refers to the results stated in 
Theorem\2\(see equation )24t ). The EAE is normahzed to the maximum EAE 
of all the schemes. 




0.15 0.2 0.25 0.3 0.35 
Normalized Update Interval {TfN) 



Fig. 7. Mean-square-error (MSE) versus Normalized Update Interval of the 
PDSGA and the basehne schemes: Bru-PDUA, Dia-PDUA and Con-PDUA 
1231 . The asymptotic order of growth refers to the results stated in Theorem 
|2] (see equation ilAV ). The MSE is normalized to the maximum MSE of all 
the schemes. 




0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.1 
rJSE tor the tracking error 



Fig. 8. Constraint Outage Probability versus Mean Square Error (MSE). 
The curves show a tradeoff between the Constraint Outage Probability and 
the performance in terms of the MSE under different channel fading rates 
which are in terms of f/N. 



For slow varying CSI, (i.e. T/N is small), it is easy to 
maintain a low constraint outage probability while achieve a 
good performance. However, for fast varying CSI, (i.e. T/N 
is large), the performance has to be sacrificed to meet with 
some target constraint outage probability by introducing a 
large backoff margin according to equation ( |25] ). 

VI. Conclusions 

In this paper, we have analyzed the convergence behavior of 
a primal-dual scaled gradient algorithm (PDSGA) that solves 
a network utility maximization (NUM) problem under time- 
varying fading channels. We have shown that the general 



PDSGA converges to a limit region rather than a single 
point under the FSMC model. Furthermore, the asymptotic 
tracking errors, namely the expected-absolute-error (EAE) and 
the mean-square-error (MSE), are both 0(T/N). We have 
also analyzed the constraint outage probability due to time 
varying CSI and introduced a backoff margin to the constraint 
to reduce the outage probability. Based on these analysis, we 
have proposed a novel low complexity distributive algorithm 
to dynamically adjust the scaling matrix based on the time 
varying CSI in the PDSGA. Simulations have been carried 
out to verify the analytical results as well as to demonstrate 
the superior performance of the PDSGA over the baseline 
schemes. 
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Appendix A 
Proof of Lemma [7] 

To prove Lemma [7] namely the contraction property of the 
PDSGA, we start with the left hand side (LHS) of equation 
( l20l i directly, i.e.. 



\y{t + T) 



' 9 Il2 



' 9 Il2 



y(t)+D(h(''))v/: (y(t);h 

D (h(«)) he (y(i); h(«)) - VC (y^; hf?) 



< ll(yW-y; 



(a) 

where < follows from the non- expansive property of the 
projection [•]+ operation ||231 . Moreover, using mean- value 
theorem |23J, 



|V£ (y(t);h(«)) -V/: [yl-M^^ 

= ||v2/:(y;h(«)) ||||y(t)-y*||, 



denotes the 



where C {mM"^) e 
second-order derivative of the Lagrangian £(y(t); h(«)) w.rt. 
y, evaluated at y{t)\ and y{t) is given by y{t) = y{t) + (1 — 
vj)y*^, with w G [0, 1]. Hence we obtain 

l|y(*+^)-y;ll2 

'l + D(h(^)) (y;h(9))) (y(t) - yj 



< 



I + D (h(9)) V^/: (y; h(«)) II J|y(0 - y, 



' 9 Il2 



< ||y(i) - y, 



q\\2 ■ 



where < follows from the property of the standard matrix- 2 

norm; and < follows from the contraction condition given in 
equation ( fT9b . 



Appendix B 
Proof of Theorem [7] 

Consider a time interval [0, T] with n switchings, i.e., 
there are n stages in the interval [0, T]. Let {qi, 92, ■ • • ,qn}, 
{Ti, T2, ■ • • , T„} and {/3i, 132, ■ ■ ■ , M denote the channel 
states, (random) sojourn times and contraction modulus of the 
n stages, respectively, as shown in Fig. |9] 

Under the conditions that (3 (D(m)) < 1, the iteration ( flTl i 
is contraction mapping in the m*^ stage (1 < m < n) 1231 . 
As a result, the distances {||e(m)||2 , 1 < to < n} between the 
iterate y(m) and the target optimal solution at the end of each 
stage can be bounded by 



||e(l)||2<||y(0)||/3^; 
||e(m)||2<||e(m- 1)112/3,^'" +<5,„_i, 



(31) 



where (5„i_i.„i denotes the distance between the target optimal 
solution of the (m — 1)*'' stage and m*'' stage, i.e., the jump of 
the equilibrium point of the switched system; and Nm denotes 



the number of updates during the m*'* stage. Iterating equation 
(I3TI 1 from TO = 1 to TO = n, we can get 



<n)h<{[\\ymWi'+Si.2] 0^^+62,3}^^' 
= l|y(0)|in/3^'"+En<5. 



7n—l 

<l|y(o)ll/3 



n n 

rn — 1 , m 

l — l m—l 

n n 

E;;=i I TT aN. 



1=2 m=l 



(32) 



where S = max„i{Sm~i.m} is the maximum distance between 
any two equilibrium points. 

Let q = [91 (72 • • ■ Qn] and T = [Ti T2 ■■■ r„], then 
take expectation of both sides of equation (|32] | w.r.t. {q, F}, 
we can get 



E 



KF} {||e(n)||J < E^q^r} {y(0)/3^--^-} 

C 71 71 \ 



/— 2 m— / 



Under the assumption that [T] 



1^ ,V q e Q (see 



equation Q), then we know that Ti,T2, - ■ ■ , r„ are identically 
distributed with probability mass function (PMF) given by 

Pr {N„, = 1}^ z^^'('-i) (1 - 1^"^") , V Z = 1, 2, ■ • • . 

Then, we have 



t 1=2 m=l J II 1=2 m=l 



Nrr 



1-P 

where p = E{^^'"} = 
let n — > +00, we get 



E 



5a 



{q.r} {||e(oo)||2} < — 



(34) 
Therefore, 



a il-/3)N 



^- (35) 



Then, by virtue of the Markov inequality we can 

get Pr{||e(^)||2><5} < 

(if J)lv' which means 
lim„^+oo Pr{y(i) G 3^} > 1 - min {l, j^}- 



Appendix C 
Proof of Theorem |2] 

As in Appendix^ we consider a time interval [0, T] with 
n switchings, i.e., there are n stages in the interval [0, T]. 
Let {(?i,q2,-- - ,9n}, {Ti,r2,--- ,r„} and {/3i,/32,-- - ,/?«} 
denote the channel states, (random) sojourn times and contrac- 
tion modulus of the n stages, respectively, as shown in Fig. 
|9] From equation (l35T l in Appendix\B\we know that, at steady 
state (i.e., when t +00), EAE(y(<)) < j^^^- 
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Fig. 9. An illustration of the concept stage of channel states and the 
associated target optimal solution in one dimensional space 1111 . The length 
Tm of the m*'' stage is a random variable, which can be one or several 
time-slots ; and the number of updates during the m*'' stage is denoted as 
nm \ the distance between the target optimal solution of the m*'" stage yj^ 
and the target optimal solution of the (m + 1)"' stage y^^^^j is denoted 

as Sm,m+l \ and the contraction modulus of in the m*'' stage is denoted as 
Pm, Vm > 1 pTl. 



Let7? = E{/32"™} = _i-^ 
have 



/32 + (l-/32)7v 



we then 



Er 




1=2 ni=l 




1—2 7n—l 



1- p ri- p \ l-?7 

Combining the above equation with equation ( l33l l. and letting 
n +00, we can get 



MSE(y(t)) < 



(52/^2 (2/3 + T(l - /3)7V) 
(l-/32)(l-/3)]v' 



Appendix D 
Proof of Lemma |2] 

To introduce a backoff margin K to reduce the constraint 
outage probabiHty, instead of keeping the constraint (/(x) < 0, 
we form the constraint as 

5(x) + i^ <0. 

Hence at the equilibrium point x*(t), it is satisfied that 
g{'x*{t)) < —K and we have the relationship 

g(x(t)) > e ^ g(x(t)) - .g(x*(t)) >e + K, 

which implies that 

Pr{.g(x(t)) > e} < Pr{.g(x(0) - g(x*(t)) >e + K}. (36) 

From condition (ii), since \\g{x{t)) — g{x*{t))\\ < c||x(t) — 
x*(t)||, we obtain 

Pr{g{^{t))-g{K*{t))>e + K} 

<Pr{||.g(xW)-.g(x*(t))||>e + A'} 

<Pr{||x(t)-x*(t)||>i±:^} (37) 



Taking the intermediate result from the proof of Theorem [7] 
(see equation ([35])). we finally get 

lim Pr{||x(i) - x*(i)|| > iiA} < ^ 



Combining (|36]|-(|38]| we complete the proof. 



(38) 
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